AnnCorra: Building Tree-banks in Indian Languages

Authors

  • Akshar Bharati
  • Rajeev Sangal
  • Vineet Chaitanya
  • Amba P. Kulkarni
  • Dipti Misra Sharma
  • K. V. Ramakrishnamacharyulu

Abstract

This paper describes a dependency-based tagging scheme for creating tree-banks for Indian languages. The scheme is designed to be comprehensive, easy to use with a linear notation, and economical in typing effort. It is based on the Paninian grammatical model.

1. BACKGROUND

The name AnnCorra, short for "Annotated Corpora", refers to an electronic lexical resource of annotated corpora. The purpose of this effort is to fill the gap in such resources for Indian languages. It will be an important resource for the development of Indian language parsers, machine learning of grammars, lakshancharts (discrimination nets for sense disambiguation) and a host of other such tools.

2. AIMS AND OBJECTIVE

The aim of the project is to:

  • develop a generalised linear syntacto-semantic tag scheme for all Indian languages
  • annotate a training corpus for all Indian languages
  • develop parallel tree-banks for all Indian languages

To fulfil the above aim, a marathon task, a collaborative model has been conceived. Any collaborative model implies the involvement of several people with varying levels of expertise. This case becomes further complicated because the tag scheme must be equally efficient for all the Indian languages. These languages, though quite similar, are not identical in their syntactic structures. Thus the tag scheme must have the following properties:

  • comprehensive enough to capture various syntactic relations across languages
  • simple enough for anyone with some background in linguistics to use
  • economical in typing effort (the corpus has to be manually annotated)

3. AN ILLUSTRATION

The task can be better understood with the help of an illustration. Consider the following sentence from Hindi:

0:: rAma    ne          moHana   ko       nIlI    kitAba  dI
    'Rama'  'ErgPostP'  'Mohan'  'PostP'  'blue'  'book'  'gave'
    'Rama gave the blue book to Mohan.'

Tree-1 represents the verb-argument relationships among the constituents of sentence 0:

                dI
    +-----------+-----------+
    |           |           |
    | k1        | k4        | k2
    |           |           |
 rAma_ne    moHana_ko     kitAba
                            |
                            | nmod
                            |
                           nIlI
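
The structure of Tree-1 can also be written down as a small set of labelled dependency arcs. The following is a minimal sketch in Python, not the AnnCorra linear notation itself (which is not shown in this excerpt). The relation labels k1, k4, k2 and nmod are taken from Tree-1; the triple layout and the function names `build_children`, `find_root` and `render` are assumptions made only for illustration.

```python
from collections import defaultdict

# Dependency arcs read off Tree-1, as (head, relation, dependent).
# The labels k1, k4, k2 and nmod come from the tree; the triple
# layout and all function names here are illustrative assumptions.
ARCS = [
    ("dI", "k1", "rAma_ne"),
    ("dI", "k4", "moHana_ko"),
    ("dI", "k2", "kitAba"),
    ("kitAba", "nmod", "nIlI"),
]


def build_children(arcs):
    """Map each head to its (relation, dependent) pairs, in input order."""
    children = defaultdict(list)
    for head, rel, dep in arcs:
        children[head].append((rel, dep))
    return children


def find_root(arcs):
    """Return the single node that is a head but never a dependent."""
    heads = {head for head, _, _ in arcs}
    deps = {dep for _, _, dep in arcs}
    roots = heads - deps
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {sorted(roots)}")
    return roots.pop()


def render(node, children, rel=None, depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    label = node if rel is None else f"{rel}: {node}"
    lines = ["  " * depth + label]
    for child_rel, child in children.get(node, []):
        lines.extend(render(child, children, child_rel, depth + 1))
    return lines


if __name__ == "__main__":
    children = build_children(ARCS)
    print("\n".join(render(find_root(ARCS), children)))
```

Storing arcs as flat triples keeps the annotation easy to type and to check mechanically; for example, `find_root` rejects an arc set that does not have exactly one root.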




Publication date: 2002